min rank | avg. rank | sentence |
---|---|---|
2340 | 10829.1429 | It-tobba jilqgħu uħud mill-miżuri, jikkritikaw infurzar “fażul” |
2097 | 19302.5000 | “Ġurnaliżmu ħieles iwassal għall-verità” |
1569 | 7462.4000 | Għandna l-kuntest ekonomiku miġjub mill-pandemija. |
1506 | 11384.5000 | Ir-rapport jipprovdi stampa inkwetanti għall-futur tal-iskejjel tat-Tagħlim tal-Lingwa Ingliża (ELT). |
1194 | 19199.1538 | Per eżempju mill- istazzjon tal-metrò propost ġo Raħal Gdid taqbad shuttle bus sal-Birgu. |
1146 | 2951.4000 | L-Ispettur Godwin Scerri mexxa l-Prosekuzzjoni. |
1032 | 4662.2000 | It-titlu ngħata direttament mill-Papa Franġisku. |
983 | 7042.8333 | Saviour imur ikellem lill-Avukat Manwel Laferla. |
969 | 9210.3750 | Santa Klaws idur f’San Pawl il-Baħar… b’karozza klassika! |
914 | 8503.1429 | Janice Xuereb (Birkirkara); Jodie Attard (Raiders Għargħur). |
914 | 1547.0000 | L-Ispettur Roderick Attard mexxa l-Prosekuzzjoni. |
912 | 15470.2857 | Jien b’tifel autistic żgur m’għandix moħħi mistrieħ. |
877 | 10533.8889 | L-informazzjoni meqjusa pertinenti għall-investigazzjoni mbagħad tniżżlet mill-espert maħtur mill-qorti. |
755 | 1470.1667 | Ġunju 2016: l-isptarijiet jgħaddu f’idejn Vitals. |
744 | 4377.8750 | Ritratti: 22 immigrant jinżammu arrestati talli “qalgħu l-inkwiet” |
722 | 1725.0000 | “Qabel jiftħu l-iskejjel, iridu jinżlu l-każi” |
705 | 10411.5714 | Erbatax -il żiemel ġrew fit-tiġrija tal-aqwa klassi. |
696 | 7383.5714 | X’se jiġri xħin Erminia tidħol ħdejn Emilio? |
671 | 8657.3750 | Il-Fondazzjoni għall-Ħarsien Soċjali qassmet 15,000 xirja lill-familji vulnerabbli. |
663 | 6196.3333 | F’din l-elezzjoni, Ellis kiseb 1,968 vot. |
635 | 6971.0000 | Marsaxlokk Brown’s Pharma għaddew għas-semifinali tan-knock-out tal-Ewwel Diviżjoni. |
635 | 5117.5000 | Minnhom tnax -il żiemel għaddew għall-finali. |
634 | 6122.7143 | Janice Xuereb (Birkirkara); Jodie Attard (Swieqi United). |
621 | 8268.4000 | Kif inhi s-sistema kurrenti taż-żjarat? |
621 | 23762.6000 | Kif sibt il-qabża mill-pjanu għall-orgni? |
606 | 19469.1250 | “Kieku kont bniedem negattiv kont ninxteħet f’rokna nibki. |
573 | 9407.1667 | Ninsabu f'awla 20, nistennew is-seduta tibda. |
573 | 6982.1667 | Sakemm imbgħad terġa’ tibda tirpilja fl-2021. |
560 | 8108.3333 | “”Għandek 140 voluntier ġo 88 karozza. |
557 | 4493.1429 | Axel Tilly (Michael Ellul) temm fir-raba’ pożizzjoni. |
In contrast to subsection 4.5.2.1 (maximum word rank in sentence), we now search for sentences consisting of rare words only. The sentences are ordered, in descending order, by the rank of the most frequent word in each sentence (the "min rank" column). The table above lists the top 30 such sentences that are longer than 40 characters.
By construction, these sentences contain no everyday words. As a consequence, we get either sentences with a very reduced structure or sentences in a foreign language. The data are therefore useful for evaluating the preprocessing, especially language detection.
select min(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m desc limit 30;
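To see what the query computes, it can be run against a tiny in-memory database. The following is a minimal sketch, not the original setup: the SQLite engine, the column types, the placeholder sentences and all `w_id` values are assumptions made for illustration. Only the table and column names (`sentences`, `inv_w`, `s_id`, `w_id`, `sentence`) come from the query itself. The sketch also assumes, as the query suggests, that word IDs up to 100 are reserved for non-word tokens, so that rank = `w_id` - 100.

```python
import sqlite3

# The query from the text, reformatted; column qualifiers added for clarity.
QUERY = """
SELECT min(i.w_id) - 100 AS m, avg(i.w_id) - 100 AS a, s.sentence
FROM sentences s, inv_w i
WHERE s.s_id = i.s_id AND length(s.sentence) > 40 AND i.w_id > 100
GROUP BY s.s_id
ORDER BY m DESC
LIMIT 30
"""

def build_toy_db():
    """Create an in-memory database with invented example data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
        CREATE TABLE inv_w (s_id INTEGER, w_id INTEGER);
    """)
    con.executemany("INSERT INTO sentences VALUES (?, ?)", [
        (1, "Placeholder sentence A made up of rare words only."),
        (2, "Placeholder sentence B containing a very frequent word."),
        (3, "Too short."),  # dropped by the length filter
    ])
    con.executemany("INSERT INTO inv_w VALUES (?, ?)", [
        (1, 7),      # assumed non-word token (w_id <= 100), ignored
        (1, 2440), (1, 5100), (1, 12100),
        (2, 101), (2, 3100), (2, 9100),
        (3, 5100), (3, 6100),
    ])
    return con

rows = build_toy_db().execute(QUERY).fetchall()
for min_rank, avg_rank, sentence in rows:
    print(min_rank, round(avg_rank, 4), sentence)
```

In this toy data, sentence A comes first because its most frequent word still has a high rank, while sentence B drops to the bottom because it contains one very frequent word.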
Should we remove sentences whose most frequent word ranks above some threshold?
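A threshold filter of this kind could be prototyped before deciding. The sketch below is hypothetical: the function name `rare_only_sentence_ids`, the threshold value and the toy rows are invented for illustration, and SQLite is assumed as the engine. It keys the filter on the minimum rank in a sentence (the rank of its most frequent word), matching the ordering used in this section, and it only lists candidate sentence IDs without deleting anything.

```python
import sqlite3

def rare_only_sentence_ids(con, min_rank_threshold):
    """Return s_ids whose most frequent word has a rank above the threshold.

    Assumes rank = w_id - 100 and that w_id <= 100 marks non-word tokens.
    """
    sql = """
        SELECT i.s_id
        FROM inv_w i
        WHERE i.w_id > 100
        GROUP BY i.s_id
        HAVING min(i.w_id) - 100 > ?
        ORDER BY i.s_id
    """
    return [row[0] for row in con.execute(sql, (min_rank_threshold,))]

# Invented toy data: (s_id, w_id) pairs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inv_w (s_id INTEGER, w_id INTEGER)")
con.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (1, 2440), (1, 5100),   # min rank 2340
    (2, 101), (2, 9100),    # min rank 1
    (3, 700), (3, 800),     # min rank 600
])

candidates = rare_only_sentence_ids(con, 500)
print(candidates)
```

Inspecting the candidates for a few thresholds before removing anything would show how many regular sentences (for example, short headlines made of proper names) such a filter would also discard.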